Go to https://github.com/uc-cfss/dataviz for the course site. This contains the course objectives, required readings, schedules, slides, etc.
Enrollment in the course is relatively small (about 10 students at last count). The nice thing about having a small class is that I can tailor it to better meet your interests. The first six weeks or so of the course are pretty much set; in the second half, however, we can customize it more to fit your interests and needs. For that reason, I’d like each of you to go to this issue on the course repo and share your thoughts on what you’d like to learn more about in the second half of the term. I have a tentative schedule that we can certainly stick to, but I am open to modifications if there are topics of interest to a substantial portion of the class.
A visualization is “any kind of visual representation of information designed to enable communication, analysis, discovery, exploration, etc.”1 However, what you seek to communicate can vary widely depending on your goals, and therefore affects the type of visualization you should design.
With information visualization, the goal is to visually depict abstract data that have no inherent physical form, as opposed to scientific visualization, where the data themselves are objects (in 1D, 2D, or 3D space). These data can be numerical (continuous or discrete), categorical, temporal, geospatial, text, etc. The purpose is to convey abstract data accurately, reveal the underlying structure in the data, and (potentially) encourage exploration of the data via an interactive element. Importantly, the visualization should also be aesthetically pleasing.
Alternatively, statistical graphics seek to visualize abstract data, typically of the quantitative form. The goal is to convey data accurately and reveal the underlying structure, but statistical graphics are generally not exploratory or interactive and may not always yield an aesthetically pleasing form.
Scatterplot matrix of the Credit dataset. Source: An Introduction to Statistical Learning: With Applications in R.
Double-time bar chart of crime in the city of San Francisco, 2009-10. Source: Visualizing Time with the Double-Time Bar Chart
Information dashboards are popular in business and industry. They visualize abstract data, frequently (though not always) over time. The goal is to convey large amounts of information quickly and identify outliers and trends. The downside is that they can become extremely dense.
Dashboard for student performance. Source: 2012 Perceptual Edge Dashboard Design Competition: We Have a Winner!
Fitbit dashboard. Source: me
Infographics depict abstract data in an effort to be eye-catching, capture attention, and convey information quickly. Unfortunately, they are frequently inaccurate, do not use space efficiently, and may not encourage exploration of the data.
Extremely sexual sun stroking. Source: The top 10 worst infographics ever created
Source: WTF Visualizations
Informative art visualizes abstract data in an effort to make visualization ambient or a part of everyday life. The goal is to aesthetically please the audience, not to be informative.
At this point in time the germ theory of disease was not widely accepted by the medical community or the public.2 A mother washed her baby’s diaper in a well in 1854 in London, sparking an outbreak of cholera, an intestinal disease that causes vomiting, diarrhea, and eventually death. The disease had appeared in London before, but its cause was still unknown. Dr. John Snow, an early epidemiologist, lived in Soho, the district of London where the disease manifested in 1854, and wanted to understand how cholera spreads through a population. Snow recorded the places of residence and employment of individuals who contracted cholera and used this information to draw a map of the region. The cases seemed to be clustered around the well pump along Broad Street. Snow used this map to deduce that the well was the source of the outbreak. Along the way he ruled out other causes by noting that individuals who lived in the area but did not contract cholera also did not drink from the well. Based on this information, the government removed the handle from the well pump so the public could not draw water from it. As a result, the cholera epidemic ended.
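The core of Snow's spatial reasoning, that cases cluster around one pump, can be sketched in a few lines. The example below is only an illustration: the coordinates are invented toy values on an arbitrary grid (not Snow's actual data), the names "Pump B" and "Pump C" are hypothetical, and straight-line distance is an assumed simplification of how people actually reached a pump.

```python
import math
from collections import Counter

# Hypothetical pump locations on an arbitrary grid (NOT Snow's real data).
PUMPS = {
    "Broad Street": (0.0, 0.0),
    "Pump B": (4.0, 1.0),
    "Pump C": (-3.0, 4.0),
}

# Hypothetical locations of cholera cases, invented for illustration.
CASES = [
    (0.2, 0.1), (-0.5, 0.4), (0.8, -0.3), (0.1, -0.9), (-0.7, -0.2),
    (1.1, 0.6), (3.6, 1.2), (-2.5, 3.8),
]


def nearest_pump(point, pumps):
    """Return the name of the pump closest to `point` (straight-line distance)."""
    return min(pumps, key=lambda name: math.dist(point, pumps[name]))


def cases_per_pump(cases, pumps):
    """Count how many cases fall nearest to each pump."""
    counts = Counter({name: 0 for name in pumps})
    counts.update(nearest_pump(case, pumps) for case in cases)
    return counts


counts = cases_per_pump(CASES, PUMPS)
for name, n in counts.most_common():
    print(f"{name}: {n}")
```

With these toy values most cases fall nearest the Broad Street pump. A tally like this only suggests a pattern; as the account above notes, Snow also examined residents who did not fall ill before drawing his conclusion.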
This illustration is identified in Edward Tufte’s The Visual Display of Quantitative Information as one of “the best statistical drawings ever created”. It also demonstrates a very important rule of warfare: never invade Russia in the winter. In 1812, Napoleon ruled most of Europe. He wanted to seize control of the British Isles, but could not overcome the UK defenses. He decided to impose an embargo to weaken the nation in preparation for invasion, but Russia refused to participate. Angered at this decision, Napoleon launched an invasion of Russia with over 400,000 troops in the summer of 1812. Russia was unable to defeat Napoleon in battle, and instead waged a war of attrition. The Russian army was in near-constant retreat, burning or destroying anything of value along the way to deny France usable resources. While Napoleon’s army maintained the military advantage, the lack of food and the onset of winter decimated his forces. He left France with an army of approximately 422,000 soldiers; he returned to France with just 10,000.
Charles Minard’s map is a stunning achievement for his era. It incorporates data across six dimensions to tell the story of Napoleon’s failure. The graph depicts:
What makes this such an effective visualization?3
Data maps were among the first data visualizations, though thousands of years passed between the first cartographic maps and the first data maps.
Split into pairs and assess this graphic.